{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Evolving objects using the functional operators API of EvoTorch\n",
    "\n",
    "In this notebook, we show how to use the functional operators API of EvoTorch for tackling a problem with non-numeric solutions.\n",
    "\n",
    "In the problem we consider, the goal is to evolve parameter tensors of a feed-forward neural network to make a simulated `Ant-v4` MuJoCo robot walk forward.\n",
    "The feed-forward neural network policy has the following modules:\n",
    "\n",
    "- module 0: linear transformation (`torch.nn.Linear`) with a **weight** matrix and with a **bias** vector\n",
    "- module 1: tanh (`torch.nn.Tanh`)\n",
    "- module 2: linear transformation (`torch.nn.Linear`) with a **weight** matrix and with a **bias** vector\n",
    "\n",
    "In this problem, instead of a fixed-length vector consisting of real numbers, a solution is represented by a dictionary structured like this:\n",
    "\n",
    "```\n",
    "{\n",
    "    \"0.weight\": [ ... list of seeds ... ],\n",
    "    \"0.bias\": [ ... list of seeds ... ],\n",
    "    \"2.weight\": [ ... list of seeds ... ],\n",
    "    \"2.bias\": [ ... list of seeds ... ],\n",
    "}\n",
    "```\n",
    "\n",
    "where each key is a name referring to a parameter tensor. Associated with each key is a list of integers (integers being random seeds). At the moment of decoding a solution, each parameter tensor (e.g. `\"0.weight\"`) is constructed by sampling a Gaussian noise using each seed, and then by summing those Gaussian noises (as was done in `[1]` and `[2]`).\n",
    "\n",
    "**Note 1:** Although this example is inspired by the studies `[1]` and `[2]`, it is not a faithful implementation of any them. Instead, this notebook focuses on demonstrating various features of the functional operators API of EvoTorch.\n",
    "\n",
    "**Note 2:** For the sake of simplicity, the action space of `Ant-v4` is binned. With this simplification and with its default hyperparameters, this example evolutionary algorithm is able to find gaits for the ant robot with a relatively small population size, although the evolved gaits will not be very efficient (i.e. non-competitive cumulative rewards).\n",
    "\n",
    "---\n",
    "\n",
    "`[1]` Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune (2017). \"Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning.\" arXiv preprint arXiv:1712.06567.\n",
    "\n",
    "`[2]` Risi, Sebastian, and Kenneth O. Stanley (2019). \"Deep neuroevolution of recurrent and discrete world models.\" Proceedings of the Genetic and Evolutionary Computation Conference.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "## Summary of the evolutionary algorithm\n",
    "\n",
    "We implement a simple, elitist genetic algorithm with tournament selection, cross-over, and mutation operators. The main ideas of this genetic algorithm are as follows.\n",
    "\n",
    "**Generation of a new solution:**\n",
    "- Make a new dictionary.\n",
    "- Associated with each key (parameter name) within the dictionary, make a single-element list of seeds, the seed within it being a random integer.\n",
    "\n",
    "**Cross-over between two solutions.**\n",
    "- Make two children solutions (dictionaries).\n",
    "- For each key (parameter name):\n",
    "    - Sample a real number $p$ between 0 and 1.\n",
    "        - If $p < 0.5$, the first child receives its list of seeds from the first parent, the second child receives its list of seeds from the second parent.\n",
    "        - Otherwise, the first child receives its list of seeds from the second parent, the second child receives its list of seeds from the first parent.\n",
    "\n",
    "**Mutation of an existing solution.**\n",
    "- Pick a key (parameter name) within the solution (dictionary).\n",
    "- Randomly sample a new integer, and add this integer into the list of seeds associated with the picked key."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from evotorch import Problem, Solution\n",
    "from evotorch.tools import make_tensor, ObjectArray\n",
    "import evotorch.operators.functional as func_ops\n",
    "\n",
    "import gymnasium as gym\n",
    "import numpy as np\n",
    "\n",
    "import torch\n",
    "from torch import nn\n",
    "from torch.func import functional_call\n",
    "\n",
    "from typing import Iterable, Mapping, Optional, Union\n",
    "import random\n",
    "import os\n",
    "from datetime import datetime\n",
    "import pickle"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "The function below takes a series of seeds, and makes a tensor of real numbers out of them.\n",
    "We will use this function at the moment of decoding a solution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_tensor_from_seeds(\n",
    "    like: torch.Tensor,\n",
    "    seeds: Iterable,\n",
    "    *,\n",
    "    mutation_power: float,\n",
    "    mutation_decay: float,\n",
    "    min_mutation_power: float,\n",
    ") -> torch.Tensor:\n",
    "    \"\"\"\n",
    "    Take a series of seeds and compute a tensor out of them.\n",
    "\n",
    "    Args:\n",
    "        like: A source tensor. The resulting tensor will have the same shape,\n",
    "            dtype, and device with this source tensor.\n",
    "        seeds: An iterable in which each item is an integer, each integer\n",
    "            being a random seed.\n",
    "        mutation_power: A multiplier for the Gaussian noise generated out of\n",
    "            a random seed.\n",
    "        mutation_decay: For each seed, the mutation power will be multiplied\n",
    "            by this factor. For example, if this multiplier is 0.9, the power\n",
    "            of the mutation will be decreased with each seed, as that power\n",
    "            will be diminished by getting multiplied with 0.9.\n",
    "        min_mutation_power: To prevent the mutation power from getting to\n",
    "            close to 0, provide a lower bound multiplier via this argument.\n",
    "    Returns:\n",
    "        The tensor generated from the given seeds.\n",
    "    \"\"\"\n",
    "    from numpy.random import RandomState\n",
    "\n",
    "    result = torch.zeros_like(like)\n",
    "    for i_seed, seed in enumerate(seeds):\n",
    "        multiplier = max(mutation_power * (mutation_decay ** i_seed), min_mutation_power)\n",
    "        result += (\n",
    "            multiplier * torch.as_tensor(RandomState(seed).randn(*(like.shape)), dtype=like.dtype, device=like.device)\n",
    "        )\n",
    "\n",
    "    return result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "Helper function to generate a random seed integer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "def sample_seed() -> int:\n",
    "    return random.randint(0, (2 ** 32) - 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "**Observation normalization.**\n",
    "Below, we have helper functions that will generate observation data for the reinforcement learning environment at hand.\n",
    "The observation data will be used for normalizing the observations before passing them to the policy neural network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "def env_name_to_file_name(env_name: str) -> str:\n",
    "    \"\"\"\n",
    "    Convert the gymnasium environment ID to a more file-name-friendly counterpart.\n",
    "\n",
    "    The character ':' in the input string will be replaced with '__colon__'.\n",
    "    Similarly, the character '/' in the input string will be replaced with '__slash__'.\n",
    "\n",
    "    Args:\n",
    "        env_name: gymnasium environment ID\n",
    "    Returns:\n",
    "        File-name-friendly counterpart of the input string.\n",
    "    \"\"\"\n",
    "    result = env_name\n",
    "    result = result.replace(\":\", \"__colon__\")\n",
    "    result = result.replace(\"/\", \"__slash__\")\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_obs_data(\n",
    "    *,\n",
    "    env_name: str,\n",
    "    num_timesteps: int,\n",
    "    report_interval: Union[int, float] = 5,\n",
    "    seed: int = 0,\n",
    ") -> tuple:\n",
    "    \"\"\"\n",
    "    Create observation normalization data with the help of random actions.\n",
    "\n",
    "    This function creates a gymnasium environment from the given `env_name`.\n",
    "    Then, it keeps sending random actions to this environment, and collects stats from the observations.\n",
    "\n",
    "    Args:\n",
    "        env_name: ID of the gymnasium environment\n",
    "        num_timesteps: For how many timesteps will the function operate on the environment\n",
    "        report_interval: Time interval, in seconds, for reporting the status\n",
    "        seed: A seed that will be used for regulating the randomness of both the environment\n",
    "            and of the random actions.\n",
    "    Returns:\n",
    "        A tuple of the form `(mean, stdev)`, where `mean` is the elementwise mean of the observation vectors,\n",
    "        and `stdev` is the elementwise standard deviation of the observation vectors.\n",
    "    \"\"\"\n",
    "    print(\"Creating observation data for\", env_name)\n",
    "\n",
    "    class accumulated:\n",
    "        sum: Optional[np.ndarray] = None\n",
    "        sum_of_squares: Optional[np.ndarray] = None\n",
    "        count: int = 0\n",
    "\n",
    "    def accumulate(obs: np.ndarray):\n",
    "        if accumulated.sum is None:\n",
    "            accumulated.sum = obs.copy()\n",
    "        else:\n",
    "            accumulated.sum += obs\n",
    "\n",
    "        squared = obs ** 2\n",
    "        if accumulated.sum_of_squares is None:\n",
    "            accumulated.sum_of_squares = squared\n",
    "        else:\n",
    "            accumulated.sum_of_squares += squared\n",
    "\n",
    "        accumulated.count += 1\n",
    "\n",
    "    rndgen = np.random.RandomState(seed)\n",
    "\n",
    "    env = gym.make(env_name)\n",
    "    assert isinstance(env.action_space, gym.spaces.Box), \"Can only work with Box action spaces\"\n",
    "\n",
    "    def reset_env() -> tuple:\n",
    "        return env.reset(seed=rndgen.randint(2 ** 32))\n",
    "\n",
    "    action_gap = env.action_space.high - env.action_space.low\n",
    "    def sample_action() -> np.ndarray:\n",
    "        return (rndgen.rand(*(env.action_space.shape)) * action_gap) + env.action_space.low\n",
    "\n",
    "    observation, _ = reset_env()\n",
    "    accumulate(observation)\n",
    "\n",
    "    last_report_time = datetime.now()\n",
    "\n",
    "    for t in range(num_timesteps):\n",
    "        action = sample_action()\n",
    "        observation, _, terminated, truncated, _ = env.step(action)\n",
    "        accumulate(observation)\n",
    "\n",
    "        done = terminated | truncated\n",
    "        if done:\n",
    "            observation, info = reset_env()\n",
    "            accumulate(observation)\n",
    "\n",
    "        tnow = datetime.now()\n",
    "        if (tnow - last_report_time).total_seconds() > report_interval:\n",
    "            print(\"Number of timesteps:\", t, \"/\", num_timesteps)\n",
    "            last_report_time = tnow\n",
    "\n",
    "    E_x = accumulated.sum / accumulated.count\n",
    "    E_x2 = accumulated.sum_of_squares / accumulated.count\n",
    "\n",
    "    mean = E_x\n",
    "    variance = np.maximum(E_x2 - ((E_x) ** 2), 1e-2)\n",
    "    stdev = np.sqrt(variance)\n",
    "\n",
    "    print(\"Done.\")\n",
    "    \n",
    "    return mean, stdev"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_obs_data(env_name: str, num_timesteps: int = 50000, seed: int = 0) -> tuple:\n",
    "    \"\"\"\n",
    "    Generate observation normalization data for the gymnasium environment whose name is given.\n",
    "\n",
    "    If such normalization data was already generated and saved into a pickle file, that pickle file will be loaded.\n",
    "    Otherwise, new normalization data will be generated and saved into a new pickle file.\n",
    "\n",
    "    Args:\n",
    "        env_name: ID of the gymnasium environment\n",
    "        num_timesteps: For how many timesteps will the observation collector operate on the environment\n",
    "        seed: A seed that will be used for regulating the randomness of both the environment\n",
    "            and of the random actions.\n",
    "    Returns:\n",
    "        A tuple of the form `(mean, stdev)`, where `mean` is the elementwise mean of the observation vectors,\n",
    "        and `stdev` is the elementwise standard deviation of the observation vectors.\n",
    "    \"\"\"\n",
    "    num_timesteps = int(num_timesteps)\n",
    "    envfname = env_name_to_file_name(env_name)\n",
    "    fname = f\"obs_seed{seed}_t{num_timesteps}_{envfname}.pickle\"\n",
    "    if os.path.isfile(fname):\n",
    "        with open(fname, \"rb\") as f:\n",
    "            return pickle.load(f)\n",
    "    else:\n",
    "        obsdata = create_obs_data(env_name=env_name, num_timesteps=num_timesteps, seed=seed)\n",
    "        with open(fname, \"wb\") as f:\n",
    "            pickle.dump(obsdata, f)\n",
    "        return obsdata"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "**Problem definition.**\n",
    "Below is the problem definition for the considered reinforcement learning task.\n",
    "We are defining the problem as a subclass of `evotorch.Problem`, so that we will be able to use ray-based parallelization capabilities of the base `Problem` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "class MyRLProblem(Problem):\n",
    "    def __init__(\n",
    "        self,\n",
    "        *,\n",
    "        env_name: str,\n",
    "        obs_mean: Optional[np.ndarray] = None,\n",
    "        obs_stdev: Optional[np.ndarray] = None,\n",
    "        mutation_power: float = 0.5,\n",
    "        mutation_decay: float = 0.9,\n",
    "        min_mutation_power: float = 0.05,\n",
    "        hidden_sizes: tuple = (64,),\n",
    "        bin: Optional[float] = 0.2,\n",
    "        num_episodes: int = 4,\n",
    "        episode_length: Optional[int] = None,\n",
    "        decrease_rewards_by: Optional[float] = 1.0,\n",
    "        num_actors: Optional[Union[int, str]] = \"max\"\n",
    "    ):\n",
    "        super().__init__(\n",
    "            objective_sense=\"max\",\n",
    "            dtype=object,\n",
    "            num_actors=num_actors,\n",
    "        )\n",
    "        self._env_name = str(env_name)\n",
    "        self._env = None\n",
    "        self._hidden_sizes = [int(hidden_size) for hidden_size in hidden_sizes]\n",
    "        self._policy = None\n",
    "\n",
    "        self._obs_mean = None if obs_mean is None else np.asarray(obs_mean).astype(\"float32\")\n",
    "        self._obs_stdev = None if obs_mean is None else np.asarray(obs_stdev).astype(\"float32\")\n",
    "        self._mutation_power = float(mutation_power)\n",
    "        self._mutation_decay = float(mutation_decay)\n",
    "        self._min_mutation_power = float(min_mutation_power)\n",
    "        self._bin = None if bin is None else float(bin)\n",
    "        self._num_episodes = int(num_episodes)\n",
    "        self._episode_length = None if episode_length is None else int(episode_length)\n",
    "        self._decrease_rewards_by = None if decrease_rewards_by is None else float(decrease_rewards_by)\n",
    "\n",
    "    def _get_policy(self) -> nn.Module:\n",
    "        env = self._get_env()\n",
    "\n",
    "        if not isinstance(env.observation_space, gym.spaces.Box):\n",
    "            raise TypeError(\n",
    "                f\"Only Box-typed environments are supported. Encountered observation space is {env.observation_space}\"\n",
    "            )\n",
    "\n",
    "        [obslen] = env.observation_space.shape\n",
    "        if isinstance(env.action_space, gym.spaces.Box):\n",
    "            [actlen] = env.action_space.shape\n",
    "        elif isinstance(env.action_space, gym.spaces.Discrete):\n",
    "            actlen = env.action_space.n\n",
    "        else:\n",
    "            raise TypeError(f\"Unrecognized action space: {env.action_space}\")\n",
    "\n",
    "        all_sizes = [obslen]\n",
    "        all_sizes.extend(self._hidden_sizes)\n",
    "        all_sizes.append(actlen)\n",
    "\n",
    "        last_size_index = len(all_sizes) - 1\n",
    "\n",
    "        modules = []\n",
    "        for i in range(1, len(all_sizes)):\n",
    "            modules.append(nn.Linear(all_sizes[i - 1], all_sizes[i]))\n",
    "            if i < last_size_index:\n",
    "                modules.append(nn.Tanh())\n",
    "\n",
    "        return nn.Sequential(*modules)\n",
    "\n",
    "    def _get_env(self, visualize: bool = False) -> gym.Env:\n",
    "        if visualize:\n",
    "            return gym.make(self._env_name, render_mode=\"human\")\n",
    "\n",
    "        if self._env is None:\n",
    "            self._env = gym.make(self._env_name)\n",
    "        return self._env\n",
    "\n",
    "    def _generate_single_solution(self) -> dict:\n",
    "        policy = self._get_policy()\n",
    "        result = {}\n",
    "        for param_name, params in policy.named_parameters():\n",
    "            result[param_name] = [sample_seed()]\n",
    "        return result\n",
    "\n",
    "    def generate_values(self, n: int) -> ObjectArray:\n",
    "        return make_tensor([self._generate_single_solution() for _ in range(n)], dtype=object)\n",
    "\n",
    "    def run_solution(\n",
    "        self,\n",
    "        x: Union[Mapping, Solution],\n",
    "        *,\n",
    "        num_episodes: Optional[int] = None,\n",
    "        visualize: bool = False\n",
    "    ) -> float:\n",
    "        if num_episodes is None:\n",
    "            num_episodes = self._num_episodes\n",
    "\n",
    "        if isinstance(x, Mapping):\n",
    "            sln = x\n",
    "        elif isinstance(x, Solution):\n",
    "            sln = x.values\n",
    "        else:\n",
    "            raise TypeError(f\"Expected a Mapping or a Solution, but got {repr(x)}\")\n",
    "\n",
    "        policy = self._get_policy()\n",
    "\n",
    "        params = {}\n",
    "        for param_name, param_values in policy.named_parameters():\n",
    "            param_seeds = sln[param_name]\n",
    "            params[param_name] = make_tensor_from_seeds(\n",
    "                param_values,\n",
    "                param_seeds,\n",
    "                mutation_power=self._mutation_power,\n",
    "                mutation_decay=self._mutation_decay,\n",
    "                min_mutation_power=self._mutation_power,\n",
    "            )\n",
    "\n",
    "        env = self._get_env(visualize=visualize)\n",
    "\n",
    "        def use_policy(policy_input: np.ndarray) -> Union[int, np.ndarray]:\n",
    "            if (self._obs_mean is not None) and (self._obs_stdev is not None):\n",
    "                policy_input = policy_input - self._obs_mean\n",
    "                policy_input = policy_input / self._obs_stdev\n",
    "\n",
    "            result = functional_call(policy, params, torch.as_tensor(policy_input, dtype=torch.float32)).numpy()\n",
    "\n",
    "            if isinstance(env.action_space, gym.spaces.Box):\n",
    "                if self._bin is not None:\n",
    "                    result = np.sign(result) * self._bin\n",
    "                result = np.clip(result, env.action_space.low, env.action_space.high)\n",
    "            elif isinstance(env.action_space, gym.spaces.Discrete):\n",
    "                result = int(np.argmax(result))\n",
    "            else:\n",
    "                raise TypeError(f\"Unrecognized action space: {repr(env.action_space)}\")\n",
    "\n",
    "            return result\n",
    "\n",
    "        cumulative_reward = 0.0\n",
    "\n",
    "        for _ in range(num_episodes):\n",
    "            timestep = 0\n",
    "            observation, info = env.reset()\n",
    "            while True:\n",
    "                action = use_policy(observation)\n",
    "                observation, reward, done1, done2, _ = env.step(action)\n",
    "                timestep += 1\n",
    "                if (self._decrease_rewards_by is not None) and (not visualize):\n",
    "                    reward = reward - self._decrease_rewards_by\n",
    "                cumulative_reward += reward\n",
    "                if (\n",
    "                    done1\n",
    "                    or done2\n",
    "                    or (\n",
    "                        (not visualize)\n",
    "                        and (self._episode_length is not None)\n",
    "                        and (timestep >= self._episode_length)\n",
    "                    )\n",
    "                ):\n",
    "                    break\n",
    "\n",
    "        return cumulative_reward / num_episodes\n",
    "\n",
    "    def visualize(self, x: Union[Solution, Mapping]) -> float:\n",
    "        return self.run_solution(x, num_episodes=1, visualize=True)\n",
    "    \n",
    "    def _evaluate(self, x: Solution):\n",
    "        x.set_evaluation(self.run_solution(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "We now define our mutation and cross-over operators, via the functions `mutate` and `cross_over`.\n",
    "Since the solutions are expressed via dictionary-like objects, we use `Mapping` for type annotations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "def mutate(solution: Mapping) -> Mapping:\n",
    "    from evotorch.tools import as_immutable\n",
    "\n",
    "    solution = {k: list(v) for k, v in solution.items()}\n",
    "\n",
    "    keys = list(solution.keys())\n",
    "    chosen_key = random.choice(keys)\n",
    "    solution[chosen_key].append(sample_seed())\n",
    "\n",
    "    return as_immutable(solution)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "def cross_over(parent1: Mapping, parent2: Mapping) -> tuple:\n",
    "    from evotorch.tools import as_immutable\n",
    "\n",
    "    keys = list(parent1.keys())\n",
    "\n",
    "    child1 = {}\n",
    "    child2 = {}\n",
    "    for k in keys:\n",
    "        p = random.random()\n",
    "        if p < 0.5:\n",
    "            child1[k] = parent1[k]\n",
    "            child2[k] = parent2[k]\n",
    "        else:\n",
    "            child1[k] = parent2[k]\n",
    "            child2[k] = parent1[k]\n",
    "\n",
    "    return as_immutable(child1), as_immutable(child2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "ID of the considered reinforcement learning task:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "ENV_NAME = \"Ant-v4\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "Generate or load observation data for the considered reinforcement learning environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "env_obs_mean, env_obs_stdev = get_obs_data(ENV_NAME)\n",
    "env_obs_mean, env_obs_stdev"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21",
   "metadata": {},
   "source": [
    "Instantiate the problem object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "problem = MyRLProblem(\n",
    "    env_name=ENV_NAME,\n",
    "    decrease_rewards_by=1.0,\n",
    "    episode_length=250,\n",
    "    bin=0.15,\n",
    "    obs_mean=env_obs_mean,\n",
    "    obs_stdev=env_obs_stdev,\n",
    ")\n",
    "\n",
    "problem"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23",
   "metadata": {},
   "source": [
    "Out of the instantiated problem object, we make a callable evaluator named `f`.\n",
    "The resulting object `f` can be used as a fitness function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24",
   "metadata": {},
   "outputs": [],
   "source": [
    "f = problem.make_callable_evaluator()\n",
    "f"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25",
   "metadata": {},
   "source": [
    "Helper function for converting a real number to a string.\n",
    "We will use this while reporting the status of the evolution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26",
   "metadata": {},
   "outputs": [],
   "source": [
    "def number_to_str(x) -> str:\n",
    "    return \"%.2f\" % float(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27",
   "metadata": {},
   "source": [
    "Hyperparameters and constants for the evolutionary algorithm:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28",
   "metadata": {},
   "outputs": [],
   "source": [
    "popsize = 16\n",
    "tournament_size = 4\n",
    "objective_sense = problem.objective_sense\n",
    "num_generations = 100"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29",
   "metadata": {},
   "source": [
    "We now prepare the initial population.\n",
    "When we are dealing with non-numeric solutions, a population is represented via `evotorch.tools.ObjectArray`, instead of `torch.Tensor`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30",
   "metadata": {},
   "outputs": [],
   "source": [
    "population = problem.generate_values(popsize)\n",
    "population"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31",
   "metadata": {},
   "source": [
    "Evaluate the fitnesses of the solutions within the initial population:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32",
   "metadata": {},
   "outputs": [],
   "source": [
    "evals = f(population)\n",
    "evals"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33",
   "metadata": {},
   "source": [
    "Main loop of the evolutionary search:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34",
   "metadata": {},
   "outputs": [],
   "source": [
    "for generation in range(1, 1 + num_generations):\n",
    "    t_begin = datetime.now()\n",
    "\n",
    "    # Apply tournament selection on the population\n",
    "    parent1_indices, parent2_indices = func_ops.tournament(\n",
    "        population,\n",
    "        evals,\n",
    "        tournament_size=tournament_size,\n",
    "        num_tournaments=popsize,\n",
    "        split_results=True,\n",
    "        return_indices=True,\n",
    "        objective_sense=objective_sense,\n",
    "    )\n",
    "\n",
    "    # The results of the tournament selection are stored within the integer\n",
    "    # tensors `parent1_indices` and `parent2_indices`.\n",
    "    # The pairs of solutions for the cross-over operator are:\n",
    "    # - `population[parent1_indices[0]]` and `population[parent2_indices[0]]`,\n",
    "    # - `population[parent1_indices[1]]` and `population[parent2_indices[1]]`,\n",
    "    # - `population[parent1_indices[2]]` and `population[parent2_indices[2]]`,\n",
    "    # - and so on...\n",
    "    num_pairs = len(parent1_indices)\n",
    "    children = []\n",
    "    for i in range(num_pairs):\n",
    "        parent1_index = int(parent1_indices[i])\n",
    "        parent2_index = int(parent2_indices[i])\n",
    "        child1, child2 = cross_over(population[parent1_index], population[parent2_index])\n",
    "        child1 = mutate(child1)\n",
    "        child2 = mutate(child2)\n",
    "        children.extend([child1, child2])\n",
    "\n",
    "    # With the help of the function `evotorch.tools.make_tensor(...)`,\n",
    "    # we convert the list of child solutions to an ObjectArray, so that\n",
    "    # `children` can be treated as a population of solutions.\n",
    "    children = make_tensor(children, dtype=object)\n",
    "\n",
    "    # Combine the original population with the population of children,\n",
    "    # forming an extended population.\n",
    "    extended_population = func_ops.combine(population, children)\n",
    "\n",
    "    # Evaluate all the solutions within the extended population.\n",
    "    extended_evals = f(extended_population)\n",
    "\n",
    "    # Take the best `popsize` number of solutions from the extended population.\n",
    "    population, evals = func_ops.take_best(\n",
    "        extended_population, extended_evals, popsize, objective_sense=objective_sense\n",
    "    )\n",
    "\n",
    "    t_end = datetime.now()\n",
    "    time_taken = (t_end - t_begin).total_seconds()\n",
    "\n",
    "    # Report the status of the evolutionary search.\n",
    "    print(\n",
    "        \"Generation:\", generation,\n",
    "        \"  Mean eval:\", number_to_str(evals.mean()),\n",
    "        \"  Pop best:\", number_to_str(evals.max()),\n",
    "        \"  Time:\", number_to_str(time_taken)\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35",
   "metadata": {},
   "source": [
    "Take the index of the best solution within the last population:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36",
   "metadata": {},
   "outputs": [],
   "source": [
    "best_index = torch.argmax(evals)\n",
    "best_index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37",
   "metadata": {},
   "source": [
    "Take the best solution within the last population:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38",
   "metadata": {},
   "outputs": [],
   "source": [
    "best_params = population[best_index]\n",
    "best_params"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39",
   "metadata": {},
   "source": [
    "Visualize the gait of the population's best solution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40",
   "metadata": {},
   "outputs": [],
   "source": [
    "problem.visualize(best_params)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
